feat(sandbox,module): ArgoEphemeralRunner + exec_env: ephemeral (infra-admin P3c PR9/12)#849
Merged
Conversation
…a-admin P3c PR9/12) Adds ArgoEphemeralRunner, a sandbox.SandboxRunner that executes a one-off command as an Argo Workflow on Kubernetes, wired as exec_env: ephemeral on step.sandbox_exec. Task 17 — ArgoEphemeralRunner (module/argo_ephemeral_runner.go): - argoSubmitter interface (SubmitWorkflow/WorkflowStatus/WorkflowLogs, all ctx-aware) satisfied by *ArgoWorkflowsModule; tests inject a fake. - buildSpec emits a single-container Workflow (entrypoint main); workflow names derive from a module-global atomic counter (no time/rand). - Status->exit-code mapping: Succeeded->0, Failed/Error->1 (documented Argo limitation: no per-container exit code from the status API). - secret:// env refs passed through UNRESOLVED (k8s secretKeyRef resolves at pod launch, ADR 0017). - Poll loop selects on ctx.Done() AND threads ctx into every submitter call so an in-flight HTTP request aborts on cancel (not just between ticks). - TTL: spec.TTLSecondsAfterFinished (default 300s) -> ttlStrategy. secondsAfterCompletion so the Argo controller GCs completed runs (no extra API call, prevents namespace accumulation). - Poll interval is injectable (default 2s; tests use 1ms). Task 18 — factory wiring (module/execenv_factory.go): - exec_env: ephemeral -> resolveEphemeralRunner: explicit argo_module name, or auto-detect the sole *ArgoWorkflowsModule (clear error on 0 or >1). - argo_module config threaded from step.sandbox_exec; resolveSandboxRunner takes a plain argoModuleName string (not variadic). - exec_env schema options -> [local-docker, ephemeral] + argo_module field in step_schema_builtins.go and module_schema.go; golden regenerated. ctx propagation (review fix): threaded ctx through (*ArgoWorkflowsModule). SubmitWorkflow/WorkflowStatus/WorkflowLogs/DeleteWorkflow/ListWorkflows -> the argoBackend methods -> doRequest, so the real backend honors cancellation/ deadline mid-HTTP. All step.argo_* Execute callers now pass their ctx. Tests: status->exit-code, spec shape, TTL (spec + rendered CRD), secret passthrough, ctx-cancel between ticks, ctx-cancel during in-flight status, WorkflowStatus error propagation, submit error, monotonic names, default poll-interval fallback; factory: ephemeral with/without/ambiguous argo module. Gate: golangci-lint cache clean + GOWORK=off go build ./... + go test ./... all exit 0; golangci-lint run 0 issues. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a new Kubernetes/Argo-backed “ephemeral” execution environment for step.sandbox_exec, enabling one-off command execution as single-container Argo Workflows while reusing the existing argo.workflows module’s direct API and improving cancellation behavior via context propagation.
Changes:
- Introduces
module.ArgoEphemeralRunner(implementssandbox.SandboxRunner) to submit/poll/log one-off Argo Workflows with TTL-based auto-cleanup. - Wires
exec_env: ephemeral(+ optionalargo_module) through schema + factory resolution to select the appropriate Argo module instance. - Threads
context.Contextthroughargo.workflowssubmit/status/logs/delete/list APIs and updates step callers + tests accordingly.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| schema/testdata/editor-schemas.golden.json | Updates editor schema output for exec_env options and new argo_module field. |
| schema/step_schema_builtins.go | Expands step.sandbox_exec schema to include ephemeral + argo_module and updates field descriptions. |
| schema/module_schema.go | Adds ephemeral exec_env option and argo_module field to module schema for sandbox exec config. |
| module/pipeline_step_sandbox_exec.go | Plumbs argo_module through step.sandbox_exec into sandbox runner resolution. |
| module/pipeline_step_argo.go | Updates Argo steps to pass real ctx into Argo module methods. |
| module/execenv_factory.go | Implements exec_env: ephemeral resolution to ArgoEphemeralRunner (explicit module name or auto-detect). |
| module/execenv_factory_test.go | Expands coverage for ephemeral exec env module selection and error paths. |
| module/argo_workflows.go | Adds TTL support in CRD rendering and updates Argo workflow-run operations to accept ctx. |
| module/argo_workflows_test.go | Updates tests for new Argo module method signatures requiring ctx. |
| module/argo_ephemeral_runner.go | New runner implementation: submit → poll status → fetch logs → map phase to exit code; sets TTLStrategy. |
| module/argo_ephemeral_runner_test.go | New unit tests covering status mapping, TTL rendering, secret passthrough, and ctx cancellation behaviors. |
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
…ixes (Copilot) - ArgoEphemeralRunner best-effort DeleteWorkflow on ctx cancellation (fresh short ctx) so a cancelled/timed-out ephemeral step doesn't leave the Argo workflow running in-cluster until TTL GC. Added to the argoSubmitter interface + both test fakes; cancel test asserts deleteCalled. - step.sandbox_exec schema description generalized (not just Docker — also remote/ephemeral via exec_env), in both schema registries + golden. - exec_env field description drops the internal PR ref; notes named remote runners are valid dynamic values. - logs-retrieval-failure comment matches the actual warning-line behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR9/12 — ArgoEphemeralRunner + exec_env: ephemeral (infra-admin Phase 3c)
Locked plan Tasks 17–18; ADR 0020. The last workflow-core PR: runs a one-off
step.sandbox_execcommand as a single-container Argo Workflow on k8s. Reuses the existing Argo module's direct API (SubmitWorkflow/WorkflowStatus/WorkflowLogs) — NOTTranslatePipelineToArgo; runner lives inmodule(no sandbox→module cycle).Changes
module.ArgoEphemeralRunner(sandbox.SandboxRunner): builds a single-containerArgoWorkflowSpec, submits, polls status to terminal, fetches logs →ExecResult. Argo exposes status not a container exit code →Succeeded→0,Failed/Error→nonzero (documented limitation).secret://env refs pass through UNRESOLVED (ADR 0017; k8ssecretKeyRef/pod creds in prod). Workflow name from an atomic counter (no time/rand). Poll interval injectable (1ms in tests).exec_env: ephemeralwired in the factory: resolves the argo module byargo_moduleconfig name, or the sole*ArgoWorkflowsModule(clear error on 0/ambiguous). Schema (module_schema.go+step_schema_builtins.go) gainsephemeraloption +argo_modulefield; golden regenerated.ttlStrategy.secondsAfterCompletion(300s) so the Argo controller GCs completed runs (no namespace accumulation).Review notes (resolved)
(*ArgoWorkflowsModule).SubmitWorkflow/WorkflowStatus/WorkflowLogs/DeleteWorkflow/ListWorkflows→ the backend →doRequest(ctx)(wascontext.Background()), so a cancelled/timed-out ephemeral exec aborts the in-flight HTTP call promptly instead of waiting the 30s client timeout. All existingstep.argo_*callers updated to pass their real ctx. Test: cancel during an in-flight status call → promptcontext.Canceled.Verified: build + full
go test ./...exit 0 (151 ok); fullgolangci-lint v2.12.00 issues.🤖 Generated with Claude Code